Classifying Steam’s Successful Games

Michael Yang, Edward Yang, Sanjae Chin

Topic and Motivation (What Is Steam?)

  • Largest digital distribution platform and storefront for PC games
  • Thousands of games are listed on Steam but only a fraction are successful
  • Key player in the video game industry
  • Our Research Topic:
    • Identify the features that best predict whether a game is successful

Our Data

Where we got our data:

  • SteamDB
  • Official Steam site
  • SteamCharts

All data was scraped on February 28, 2024.

Our key variables:

  • Genres
  • Publisher/Developer
  • Base Price
  • Success
  • Number of Positive Reviews
  • Number of Negative Reviews
  • Total Number of Reviews
  • Peak Player Count
  • Peak Daily Player Count

EDA Visualizations

  • About 18% (598) of games in our cleaned dataset are considered successful
  • About 70% (421) of successful games cost less than $20
    • About 21% (128) were free
  • The average positive/negative review ratio is around 95%

Figures:

  • Number of successful and non-successful games
  • Price distribution for successful/non-successful games
  • Positive/negative (log) review ratio distribution for successful/non-successful games

EDA Genres

  • Most successful games are Indie (64%)
  • A decent number of games have Action-Adventure aspects (41%)
  • Indie Action-Adventure games are more likely to be successful compared to other genres

Figure: Genre distribution

Methodology

  • Logistic Regression (LR)
    • Simplicity, interpretability, linearity
  • Support Vector Machine (SVM)
    • Effectiveness in handling high-dimensional data and complex decision boundaries
  • Random Forest (RF)
    • Ability to handle non-linear relationships
  • Gradient Boosting Machine (GBM)
    • Iterative improvement of model performance
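
The four model families listed above could be fit and compared along these lines. This is a minimal sketch, not the project's actual pipeline: it assumes scikit-learn, uses a synthetic dataset from `make_classification` as a stand-in for the cleaned Steam data, and every hyperparameter shown is an illustrative default rather than a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cleaned Steam data (assumption): 4 numeric
# features, with roughly 18% of rows in the positive ("successful") class.
X, y = make_classification(
    n_samples=1000, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.82, 0.18], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# LR and SVM are sensitive to feature scale, so they get a scaler;
# the tree-based models do not need one.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC AUC needs a continuous score, so use the positive-class
    # probability rather than the hard 0/1 prediction.
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)

for name, score in auc_scores.items():
    print(f"{name}: ROC AUC = {score:.3f}")
```

The stratified split keeps the share of successful games similar in the training and test sets, which matters when the positive class is a minority.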

Results

We use the ROC_AUC score on the test dataset to compare how well each model classifies success.

    • ROC_AUC for SVM is 0.88
    • ROC_AUC for LR is 0.92
    • ROC_AUC for RF is 1.0
    • ROC_AUC for GBM is 0.5
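
As an aid to reading these scores: ROC AUC equals the probability that a randomly chosen successful game receives a higher model score than a randomly chosen non-successful one. Under that definition, 1.0 means the two groups are perfectly separated, and 0.5 is what a model earns when its ranking is no better than chance, for example when it gives every game the same score. A minimal pure-Python sketch follows; the labels and scores are made up for illustration and are not the project's data.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: share of (positive, negative) pairs in which the
    positive has the higher score, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]  # 1 = successful, 0 = not successful (toy data)

# Every successful game outranks every non-successful one.
perfect = roc_auc(labels, [0.9, 0.8, 0.3, 0.2, 0.1])

# The model outputs the same score for every game.
constant = roc_auc(labels, [0.5, 0.5, 0.5, 0.5, 0.5])

# One successful game is ranked below two non-successful ones.
mixed = roc_auc(labels, [0.9, 0.2, 0.3, 0.8, 0.1])

print(perfect, constant, round(mixed, 3))
```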

Conclusions and Discussion

Currently our models are generally doing “well” at classifying success based on our numerical columns (“BASE PRICE”, “ALL TIME PEAK”, “TOTAL REVIEWS”, “NEGATIVE REVIEWS”). However, once we added the genre columns, the AUC scores were always 1.0.

We think this is possibly due to one of two reasons:

  • Our changed definition of success is now too simple
  • There is some 1-1 mapping from our genre columns to the outcome (success)

Going forward, we want to find where this originates. Depending on what we uncover:

  • We may change our definition of success
  • If the origin seems interesting, we may analyze it further
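
One way to start looking for the suspected 1-1 mapping is to check, column by column, whether each distinct feature value always co-occurs with a single outcome. The sketch below is only an illustration under stated assumptions: the column names (`genre_indie`, `genre_action`, `genre_flag_x`) and the rows are invented, and the helper `perfectly_predictive_columns` is hypothetical rather than part of the project's code.

```python
from collections import defaultdict


def perfectly_predictive_columns(rows, target):
    """Return the columns whose every distinct value appears with exactly
    one value of `target`, i.e. columns that determine the outcome alone."""
    columns = [c for c in rows[0] if c != target]
    leaky = []
    for col in columns:
        outcomes = defaultdict(set)  # feature value -> set of outcomes seen
        for row in rows:
            outcomes[row[col]].add(row[target])
        varies = len(outcomes) > 1  # ignore constant columns
        if varies and all(len(seen) == 1 for seen in outcomes.values()):
            leaky.append(col)
    return leaky


# Invented one-hot genre rows; "success" is the outcome column.
rows = [
    {"genre_indie": 1, "genre_action": 1, "genre_flag_x": 1, "success": 1},
    {"genre_indie": 1, "genre_action": 0, "genre_flag_x": 0, "success": 0},
    {"genre_indie": 0, "genre_action": 1, "genre_flag_x": 0, "success": 0},
    {"genre_indie": 0, "genre_action": 0, "genre_flag_x": 1, "success": 1},
    {"genre_indie": 1, "genre_action": 1, "genre_flag_x": 0, "success": 0},
]

leaky = perfectly_predictive_columns(rows, target="success")
print(leaky)
```

A check like this only catches single columns that determine the outcome on their own. If a combination of genre columns encodes success, a follow-up step would be to group rows by the full genre vector and apply the same test.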